vlt 1
AInstantaneous Regret Bound Conditioned on the event that (8) in Lemma 1 holds (with probability 1 δ), it follows that
As wt is selected as an LV w.r.t. From (18), (19), and (20), we obtain (9), (10), and (11), respectively. C1TβTγT (21) where C1 = 8/log(1 + σ 2n) and γT is the maximum information gain about f that can be obtained by observing any set of T observations. By selecting xtas a sample from the posterior belief of x given yDt 1, it is noted that the distribution of xt and x are the same, i.e., p(xt|yDt 1) = p(x |yDt 1). We prove the following Lemma 4 which is then used to prove Lemma 5. Lemma 3 follows from Lemma 5. Lemma 5. Consider a realization f1 of the black-box function f following the GP posterior belief Thus, Lemma 5 implies Lemma 3. Let us consider CV-TS with a batch query of size k at each iteration.
Optimizing Conditional Value-At-Risk of Black-Box Functions
This paper presents two Bayesian optimization (BO) algorithms with theoretical performance guarantee to maximize the conditional value-at-risk (CVaR) of a black-box function: CV-UCB and CV-TS which are based on the well-established principle of optimism in the face of uncertainty and Thompson sampling, respectively. To achieve this, we develop an upper confidence bound of CVaR and prove the no-regret guarantee of CV-UCB by utilizing an interesting connection between CVaR and value-at-risk (VaR). For CV-TS, though it is straightforwardly performed with Thompson sampling, bounding its Bayesian regret is non-trivial because it requires a tail expectation bound for the distribution of CVaR of a black-box function, which has not been shown in the literature. The performances of both CV-UCB and CV-TS are empirically evaluated in optimizing CVaR of synthetic benchmark functions and simulated real-world optimization problems.
219ece62fae865562d4510ea501cf349-Supplemental.pdf
If there are multiple LVs, we select the LV with the maximum probabilityp(W). It is a heuristic to improve the empirical performancesuggestedby[13]. The simulated robot pushing experiment is taken from [23]. The simulation returns the location of apushed object given the robot'slocation and the pushing duration, i.e.,x. The portfolio optimization problem is taken from [4].